76 research outputs found

    Pathway-Based Genomics Prediction using Generalized Elastic Net.

    Get PDF
    We present a novel regularization scheme called The Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway agnostic approach

    Retrocopy contributions to the evolution of the human genome

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Evolution via point mutations is a relatively slow process and is unlikely to completely explain the differences between primates and other mammals. By contrast, 45% of the human genome is composed of retroposed elements, many of which were inserted in the primate lineage. A subset of retroposed mRNAs (retrocopies) shows strong evidence of expression in primates, often yielding functional retrogenes.</p> <p>Results</p> <p>To identify and analyze the relatively recently evolved retrogenes, we carried out BLASTZ alignments of all human mRNAs against the human genome and scored a set of features indicative of retroposition. Of over 12,000 putative retrocopy-derived genes that arose mainly in the primate lineage, 726 with strong evidence of transcript expression were examined in detail. These mRNA retroposition events fall into three categories: I) 34 retrocopies and antisense retrocopies that added potential protein coding space and UTRs to existing genes; II) 682 complete retrocopy duplications inserted into new loci; and III) an unexpected set of 13 retrocopies that contributed out-of-frame, or antisense sequences in combination with other types of transposed elements (SINEs, LINEs, LTRs), even unannotated sequence to form potentially novel genes with no homologs outside primates. In addition to their presence in human, several of the gene candidates also had potentially viable ORFs in chimpanzee, orangutan, and rhesus macaque, underscoring their potential of function.</p> <p>Conclusion</p> <p>mRNA-derived retrocopies provide raw material for the evolution of genes in a wide variety of ways, duplicating and amending the protein coding region of existing genes as well as generating the potential for new protein coding space, or non-protein coding RNAs, by unexpected contributions out of frame, in reverse orientation, or from previously non-protein coding sequence.</p

    The UCSC Archaeal Genome Browser

    Get PDF
    As more archaeal genomes are sequenced, effective research and analysis tools are needed to integrate the diverse information available for any given locus. The feature-rich UCSC Genome Browser, created originally to annotate the human genome, can be applied to any sequenced organism. We have created a UCSC Archaeal Genome Browser, available at , currently with 26 archaeal genomes. It displays G/C content, gene and operon annotation from multiple sources, sequence motifs (promoters and Shine-Dalgarno), microarray data, multi-genome alignments and protein conservation across phylogenetic and habitat categories. We encourage submission of new experimental and bioinformatic analysis from contributors. The purpose of this tool is to aid biological discovery and facilitate greater collaboration within the archaeal research community

    Sustainability? Population Affluence Species Technology

    Get PDF
    Presentation on algae and sustainability of the earth. Discusses the Offshore Membrane Enclosures for Growing Algae (OMEGA)

    GeneHub-GEPIS: digital expression profiling for normal and cancer tissues based on an integrated gene database

    Get PDF
    GeneHub-GEPIS is a web application that performs digital expression analysis in human and mouse tissues based on an integrated gene database. Using aggregated expressed sequence tag (EST) library information and EST counts, the application calculates the normalized gene expression levels across a large panel of normal and tumor tissues, thus providing rapid expression profiling for a given gene. The backend GeneHub component of the application contains pre-defined gene structures derived from mRNA transcript sequences from major databases and includes extensive cross references for commonly used gene identifiers. ESTs are then linked to genes based on their precise genomic locations as determined by GMAP. This genome-based approach reduces incorrect matches between ESTs and genes, thus minimizing the noise seen with previous tools. In addition, the gene-centric design makes it possible to add several important features, including text searching capabilities, the ability to accept diverse input values, expression analysis for microRNAs, basic gene annotation, batch analysis and linking between mouse and human genes. GeneHub-GEPIS is available at http://www.cgl.ucsf.edu/Research/genentech/genehub-gepis/ or http://www.gepis.org/

    Algae Bioreactor Using Submerged Enclosures with Semi-Permeable Membranes

    Get PDF
    Methods for producing hydrocarbons, including oil, by processing algae and/or other micro-organisms in an aquatic environment. Flexible bags (e.g., plastic) with CO.sub.2/O.sub.2 exchange membranes, suspended at a controllable depth in a first liquid (e.g., seawater), receive a second liquid (e.g., liquid effluent from a "dead zone") containing seeds for algae growth. The algae are cultivated and harvested in the bags, after most of the second liquid is removed by forward osmosis through liquid exchange membranes. The algae are removed and processed, and the bags are cleaned and reused

    Forces Shaping the Fastest Evolving Regions in the Human Genome

    Get PDF
    Comparative genomics allow us to search the human genome for segments that were extensively changed in the last ~5 million years since divergence from our common ancestor with chimpanzee, but are highly conserved in other species and thus are likely to be functional. We found 202 genomic elements that are highly conserved in vertebrates but show evidence of significantly accelerated substitution rates in human. These are mostly in non-coding DNA, often near genes associated with transcription and DNA binding. Resequencing confirmed that the five most accelerated elements are dramatically changed in human but not in other primates, with seven times more substitutions in human than in chimp. The accelerated elements, and in particular the top five, show a strong bias for adenine and thymine to guanine and cytosine nucleotide changes and are disproportionately located in high recombination and high guanine and cytosine content environments near telomeres, suggesting either biased gene conversion or isochore selection. In addition, there is some evidence of directional selection in the regions containing the two most accelerated regions. A combination of evolutionary forces has contributed to accelerated evolution of the fastest evolving elements in the human genome

    The Structure of a Rigorously Conserved RNA Element within the SARS Virus Genome

    Get PDF
    We have solved the three-dimensional crystal structure of the stem-loop II motif (s2m) RNA element of the SARS virus genome to 2.7-Å resolution. SARS and related coronaviruses and astroviruses all possess a motif at the 3′ end of their RNA genomes, called the s2m, whose pathogenic importance is inferred from its rigorous sequence conservation in an otherwise rapidly mutable RNA genome. We find that this extreme conservation is clearly explained by the requirement to form a highly structured RNA whose unique tertiary structure includes a sharp 90° kink of the helix axis and several novel longer-range tertiary interactions. The tertiary base interactions create a tunnel that runs perpendicular to the main helical axis whose interior is negatively charged and binds two magnesium ions. These unusual features likely form interaction surfaces with conserved host cell components or other reactive sites required for virus function. Based on its conservation in viral pathogen genomes and its absence in the human genome, we suggest that these unusual structural features in the s2m RNA element are attractive targets for the design of anti-viral therapeutic agents. Structural genomics has sought to deduce protein function based on three-dimensional homology. Here we have extended this approach to RNA by proposing potential functions for a rigorously conserved set of RNA tertiary structural interactions that occur within the SARS RNA genome itself. Based on tertiary structural comparisons, we propose the s2m RNA binds one or more proteins possessing an oligomer-binding-like fold, and we suggest a possible mechanism for SARS viral RNA hijacking of host protein synthesis, both based upon observed s2m RNA macromolecular mimicry of a relevant ribosomal RNA fold

    The Consensus Coding Sequence (Ccds) Project: Identifying a Common Protein-Coding Gene Set for the Human and Mouse Genomes

    Get PDF
    Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.National Human Genome Research Institute (U.S.) (Grant number 1U54HG004555-01)Wellcome Trust (London, England) (Grant number WT062023)Wellcome Trust (London, England) (Grant number WT077198
    corecore